5 research outputs found

Unsupervised Machine Learning for Explainable Medicare Fraud Detection

Authors: Akoglu Leman, Leder-Luis Jetson, Shekhar Shubhranshu
Publication date: 09/11/2022

The US federal government spends more than a trillion dollars per year on health care, largely provided by private third parties and reimbursed by the government. A major concern in this system is overbilling, waste and fraud by providers, who face incentives to misreport on their claims in order to receive higher payments. In this paper, we develop novel machine learning tools to identify providers that overbill Medicare, the US federal health insurance program for elderly adults and the disabled. Using large-scale Medicare claims data, we identify patterns consistent with fraud or overbilling among inpatient hospitalizations. Our proposed approach for Medicare fraud detection is fully unsupervised, not relying on any labeled training data, and is explainable to end users, providing reasoning and interpretable insights into the potentially suspicious behavior of the flagged providers. Data from the Department of Justice on providers facing anti-fraud lawsuits and several case studies validate our approach and findings both quantitatively and qualitatively.

Comment: Working paper

arXiv.org e-Print Archive

Discovery and Exploitation of Generalized Network Effects

Authors: Faloutsos Christos, Lee Meng-Chieh, Shekhar Shubhranshu, Yoo Jaemin
Publication date: 28/08/2023

Given a large graph with few node labels, how can we (a) identify whether the graph exhibits generalized network effects (GNE), (b) estimate GNE to explain the interrelations among node classes, and (c) exploit GNE to improve downstream tasks such as predicting the unknown labels accurately and efficiently? The knowledge of GNE is valuable for various tasks like node classification and targeted advertising. However, identifying and understanding GNE such as homophily, heterophily or their combination is challenging in real-world graphs due to the limited availability of node labels and noisy edges. We propose NetEffect, a graph mining approach to address the above issues, enjoying the following properties: (i) Principled: a statistical test to determine the presence of GNE in a graph with few node labels; (ii) General and Explainable: a closed-form solution to estimate the specific type of GNE observed; and (iii) Accurate and Scalable: the integration of GNE for accurate and fast node classification. Applied to public, real-world graphs, NetEffect discovers the unexpected absence of GNE in numerous graphs, which were previously thought to exhibit heterophily. Further, we show that incorporating GNE is effective for node classification. On a large real-world graph with 1.6M nodes and 22.3M edges, NetEffect achieves over 7 times speedup (14 minutes vs. 2 hours) compared to most competitors.

Comment: Under Submission

arXiv.org e-Print Archive

Benefit-aware Early Prediction of Health Outcomes on Multivariate EEG Time Series

Authors: Akoglu Leman, Elmer Jonathan, Eswaran Dhivya, Faloutsos Christos, Hooi Bryan, Shekhar Shubhranshu
Publication date: 10/11/2021

Given a cardiac-arrest patient being monitored in the ICU (intensive care unit) for brain activity, how can we predict their health outcomes as early as possible? Early decision-making is critical in many applications, e.g. monitoring patients may assist in early intervention and improved care. On the other hand, early prediction on EEG data poses several challenges: (i) earliness-accuracy trade-off; observing more data often increases accuracy but sacrifices earliness, (ii) large-scale (for training) and streaming (online decision-making) data processing, and (iii) multi-variate (due to multiple electrodes) and multi-length (due to varying length of stay of patients) time series. Motivated by this real-world application, we present BeneFitter, which infuses the incurred savings from an early prediction as well as the cost from misclassification into a unified domain-specific target called benefit. Unifying these two quantities allows us to directly estimate a single target (i.e. benefit), and importantly, dictates exactly when to output a prediction: when the benefit estimate becomes positive. BeneFitter (a) is efficient and fast, with training time linear in the number of input sequences, and can operate in real-time for decision-making, (b) can handle multi-variate and variable-length time-series, suitable for patient data, and (c) is effective, providing up to 2x time-savings with accuracy equal to or better than that of competitors.

Comment: arXiv submission

arXiv.org e-Print Archive

Is There Any Link Between Death of Preceding Child and Child Health Care Services Utilization for Subsequent Birth?

Authors: B Ali, JO Akinyemi, M Ranjan, Manoj Alagarajan, MM Rahman, PK Singh, R Prakash, Ratna Patel, S Vellakkal, Shekhar Chauhan, Shobhit Srivastava, Shubhranshu Kumar Upadhyay, V Deshmukh, Y Krishnamoorthy
Publication venue: 'Springer Science and Business Media LLC'

Crossref